BMC Genomic Data — Latest Matching Preprints

1

Backup transcription factor binding sites protect human genes from mutations in the promoter

Brown, J. C.

2023-01-27 molecular biology 10.1101/2023.01.27.525856 medRxiv

Top 0.1%

12.1%

This study was designed to test the idea that human gene promoters have evolved to be resistant to the effects of mutations in their primary function, the control of gene expression. It is proposed that the transcription factor/transcription factor binding site (TF/TFBS) pair having the greatest effect on control of a gene is the one with the highest abundance in the promoter. Other pairs would have the same effect on gene expression and would predominate in the event of a mutation in the most abundant pair. It is expected that the overall promoter architecture proposed here will be highly resistant to mutagenic change that would otherwise affect expression of the gene. The idea was tested beginning with a database of 42 human genes highly specific for expression in brain. For each gene, information was accumulated about its expression level and about the TFBS occupancy of the five most abundant TF/TFBS pairs. Expression level was then plotted against TFBS occupancy separately for each of the five pairs, and the plots were compared with each other. The plots were found to be similar, and the results were interpreted to indicate that the TFBS occupancy ranks evolved to yield the same effect on gene expression level with multiple ranks able to function in the event of mutation in another. A similar analysis was conducted with a database of 31 human liver specific genes, and the overall result was found to be the same. Backup TFBS occupancy ranks were interpreted to be present in both brain and liver specific genes. Finally, the TFBSs in the brain specific and liver specific gene populations were compared with each other with the goal of identifying any brain selective or liver selective TFBSs. Of the 89 TFBSs in the pooled population, 58 were found only in brain specific but not liver specific genes, 8 only in liver specific but not brain specific genes and 23 were found in both brain and liver specific genes. The results were interpreted to emphasize the large number of TFBS in brain specific but not liver specific genes.
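
The brain versus liver comparison described in this abstract amounts to partitioning two sets of TFBS names. The following is a minimal sketch of that partition, not the author's code: the function name `partition_tfbs` and the placeholder identifiers `TFBS_0` to `TFBS_88` are assumptions made for illustration, and only the counts (58, 8, 23, 89) come from the abstract.

```python
def partition_tfbs(brain, liver):
    """Split two collections of TFBS names into tissue-only and shared groups."""
    brain, liver = set(brain), set(liver)
    return {
        "brain_only": brain - liver,   # present in brain-specific genes only
        "liver_only": liver - brain,   # present in liver-specific genes only
        "shared": brain & liver,       # present in both gene populations
    }


# Hypothetical placeholder names, arranged so the group sizes match the
# counts reported in the abstract (58 brain-only, 23 shared, 8 liver-only).
brain_tfbs = {f"TFBS_{i}" for i in range(0, 81)}    # 58 brain-only + 23 shared
liver_tfbs = {f"TFBS_{i}" for i in range(58, 89)}   # 23 shared + 8 liver-only

parts = partition_tfbs(brain_tfbs, liver_tfbs)
counts = {name: len(members) for name, members in parts.items()}
pooled = len(brain_tfbs | liver_tfbs)

print(counts, "pooled:", pooled)
```

The three groups are disjoint by construction, so their sizes always sum to the size of the pooled set, which is the consistency check the reported figures satisfy (58 + 8 + 23 = 89).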

2

Human PRE-PIK3C2B exhibits long-range intra- and inter-chromosomal interactions with genomic regions enriched in repressive marks.

Maini, J.; Pathak, A. K.; Bhattacharyya, K.; Kumar, N.; Narang, A.; Jain, N.; Singh, I.; Dhingra, V.; Brahmachari, V.

2020-11-11 genetics 10.1101/2020.11.11.378745 medRxiv

Top 0.1%

11.9%

Human PRE-PIK3C2B is a dual nature polycomb response element that interacts with both polycomb and trithorax members. In the current study, using 4C-Seq (Capturing Circular Chromosomal Conformation-Sequencing), we identified long-range chromatin interactions associated with PRE-PIK3C2B and validated them with 3C-PCR. We identified both intra- and inter-chromosomal interactions, a large proportion of which were found to be closely distributed around transcriptional start sites (TSS). A significant number of interactions were also found to be associated with heterochromatic regions. Meta-analysis of ENCODE ChIP-Seq data identified an overall enrichment of YY1 and CTCF, as well as histone modifications such as H3K4me3 and H3K27me marks, in different cell lines. Almost 90% of interactions were derived from either intronic or intergenic regions, among which a large proportion of intronic interactors were either unique sequences or LINE/SINE derived. In the case of intergenic interactions, the majority were associated with LINE/SINE repeats. We further found that genes proximal to the interactor sequences were co-expressed and showed reduced expression. To the best of our knowledge, this is one of the early demonstrations of long-range interaction of PRE sequences in the human genome.

3

Phyloepigenetics in phylogeny analyses

Santourlidis, S.

2024-08-18 evolutionary biology 10.1101/2024.08.14.607911 medRxiv

Top 0.1%

7.8%

Long-standing, continuous blurring and controversies in the field of phylogenetic interspecies relations, associated with insufficient explanations for the dynamics and variable speeds of evolution in mammals, hint at a crucial missing link. It has been suggested that transgenerational epigenetic inheritance and the concealed mechanisms behind it play a distinct role in mammalian evolution. Here, a comprehensive sequence alignment approach in hominid species, i.e., Homo sapiens, Homo neanderthalensis, Denisovan human, Pan troglodytes, Pan paniscus, Gorilla gorilla and Pongo pygmaeus, comprising conserved CpG islands of housekeeping genes, uncovers evidence for a distinct variability of CpG dinucleotides. Applying solely these evolutionarily consistent and inconsistent CpG sites in a classic phylogenetic analysis, calibrated by the divergence time point of the common chimpanzee (Pan troglodytes) and the bonobo or pygmy chimpanzee (Pan paniscus), a "phylo-epigenetic" tree has been generated which precisely recapitulates branch points and branch lengths, i.e., divergence events and relations, as they have been broadly suggested in the current literature, based on comprehensive molecular phylogenomics and fossil records. I suggest here that CpG dinucleotide changes at CpG islands are of superior importance for evolutionary development and determine the emerging DNA methylation profiles.

4

Identification And Characterization of Differentially Co-Expressed Head-To-Head Gene Pairs in Pancreatic Cancer

Wang, C.; Ren, C.

2023-10-15 genetic and genomic medicine 10.1101/2023.10.14.23297044 medRxiv

Top 0.1%

7.2%

Head-to-head (H2H) gene pairs are an evolutionarily conserved genomic configuration (sharing a bidirectional promoter) with implications in cancer. This study investigates the transcriptional mechanisms of H2H genes in pancreatic ductal adenocarcinoma (PDAC). We found that H2H pairs involving housekeeping and tumor suppressor genes maintained a more stable expression correlation, while most pairs had reduced co-expression. Differential co-expression analysis revealed 15 H2H gene pairs significantly altered in PDAC. The gene pairs RPL7-RDH10 and STAC-RNF38 showed both high transcription factor similarity and differential co-expression, and enrichment analysis highlighted FOXC1 and YY1 as key regulators of altered H2H pairs. Our characterization of H2H transcriptional patterns in PDAC provides insights into bidirectional promoter disruption. The identified H2H gene signatures offer paired biomarker potential for improved PDAC detection. Further study of mechanisms maintaining H2H stability could spur new therapeutic approaches.

5

Mapping Chromatin Interactions of ZBP1 and ADAR Z-Alpha Domains: A ChIP-Seq Based Comparison

Hamrick, D.; Sharma, M.; Grow, E. J.

2024-12-01 genomics 10.1101/2024.11.29.626086 medRxiv

Top 0.1%

6.6%

The DNA double helix typically exists in the canonical B-form conformation, but it can also adopt an alternative form known as Z-DNA. In Z-DNA, the DNA helix winds to the left in a zigzag pattern instead of the right-handed B-DNA form. Z-DNA is thought to play a key role in transcription, but it is unclear whether it is a positive or negative regulator of RNA polymerase activity. Additionally, several studies have shown how Z-DNA contributes to DNA damage or genome instability. However, the precise role of Z-DNA in the genome remains unclear. To address this question, we mapped Z-DNA using a ChIP-Seq assay with two Z-DNA biosensors: Zaa-Zbp1, composed of dimerized Z-alpha Z-DNA binding domains from Z-DNA binding protein 1 (Zbp1), and Zaa-Adar1, composed of dimerized Z-alpha domains from Adenosine deaminase acting on RNA 1 (Adar1). We found that these Zaa probes possessed similar binding profiles when analyzed with motif analysis, but gene ontology analysis revealed that these Z-alpha domains bound to heterogeneous genes, with Zaa-Zbp1 most strongly binding to genes in the RHOQ-GTPase pathway and Zaa-Adar1 binding to genes involved in the M phase of the cell cycle.

6

The Human Canonical Core Histone Catalogue

Susano Pinto, D. M.; Flaus, A.

2019-07-30 molecular biology 10.1101/720235 medRxiv

Top 0.1%

6.5%

Core histone proteins H2A, H2B, H3, and H4 are encoded by a large family of genes distributed across the human genome. Canonical core histones contribute the majority of proteins to bulk chromatin packaging, and are encoded in 4 clusters by 65 coding genes comprising 17 for H2A, 18 for H2B, 15 for H3, and 15 for H4, along with at least 17 total pseudogenes. The canonical core histone genes display coding variation that gives rise to 11 H2A, 15 H2B, 4 H3, and 2 H4 unique protein isoforms. Although histone proteins are highly conserved overall, these isoforms represent a surprising and seldom recognised variation with amino acid identity as low as 77% between canonical histone proteins of the same type. The gene sequence and protein isoform diversity also exceeds commonly used subtype designations such as H2A.1 and H3.1, and exists in parallel with the well-known specialisation of variant histone proteins. RNA sequencing of histone transcripts shows evidence for differential expression of histone genes but the functional significance of this variation has not yet been investigated. To assist understanding of the implications of histone gene and protein diversity we have catalogued the entire human canonical core histone gene and protein complement. In order to organise this information in a robust, accessible, and accurate form, we applied software build automation tools to dynamically generate the canonical core histone repertoire based on current genome annotations and then to organise the information into a manuscript format. Automatically generated values are shown with a light grey background. Alongside recognition of the encoded protein diversity, this has led to multiple corrections to human histone annotations, reflecting the flux of the human genome as it is updated and enriched in reference databases. This dynamic manuscript approach is inspired by the aims of reproducible research and can be readily adapted to other gene families.

7

Tracing the expansion of p53 retrogenes in elephant species: A foundation for functional insights.

Karakostis, K.; Campoy, E.; Puig, M.; Fahraeus, R.; Vollrath, F.; Caceres, M.

2026-02-26 evolutionary biology 10.64898/2026.02.25.707932 medRxiv

Top 0.1%

5.4%

Elephants have evolved multiple TP53 copies through a retrotransposition event followed by successive duplications. Some of these TP53 retrogenes (RTGs) are expressed and hypothesized to have functional roles in cellular regulation. However, comparative genomic studies on TP53 evolution and function are limited due to scarce genomic data for elephants and other afrotherians. Most existing research relies on scaffold assemblies of Loxodonta africana (LoxAfr3 and LoxAfr4), with some focus on the Elephas maximus chromosomal assembly. In this in silico study, we analyzed three elephant genomes to validate TP53 RTGs, assess their copy variation, and trace their evolution. For the first time we describe 29 TP53 RTGs in E. maximus versus 18-19 in L. africana. These copies show sequence variation, especially in the duplicated regions and their flanking repetitive elements. Chromosomal mapping in E. maximus revealed that two major classes of TP53 RTGs are consistently arranged in pairs on chromosome 27, which harbours 27 of the 29 identified copies. The observed distribution strongly supports an evolutionary model in which large-scale genomic segments, each encompassing at least two retrogenes of different groups, were duplicated early in the elephant lineage, driving the extensive amplification of TP53 RTGs, as suggested also by the patterns of the flanking repetitive elements. The TP53 RTG expansion was followed by a unique inversion on chromosome 27 that separates the duplication clusters. Thus, this study enhances our understanding of the elephants' multi-p53 system, linked to cancer resistance, body size, and Peto's paradox, and supports ongoing research into functional aspects.
Significance Statement: This study uncovers the complex genomic architecture of elephant TP53 RTGs, which have diversified into two phylogenetically distinct types present in both African and Asian elephants. While both species share these 2 types, they differ substantially in copy number (18 in African versus 28 in Asian elephants), and in sequence variation, highlighting lineage-specific evolutionary trajectories. Detailed sequence analyses of the chromosomal organization of these copies in the Elephas maximus high-quality genome assembly, particularly the cluster on chromosome 27, indicate that they arose from stepwise duplications of extended genomic segments, which likely facilitated both the expansion and regulatory diversification of the TP53 RTG repertoire. Comparative analysis with other mammals reveals an elephant-specific inversion that reorganized the expanded TP53 RTG copies, creating a new genomic configuration that could have influenced the regulation and expression of some of the p53 RTGs. Therefore, this work advances our understanding of how evolutionary pressures shaped the landscape of TP53 RTGs and their flanking genetic elements in elephants. By accurately identifying the retrogene copy numbers, sequences and putative functional domains, it establishes a critical foundation for future studies investigating their functional roles in DNA damage and genome stability.

8

The genetic sequences prone to copy number variation and single nucleotide polymorphism are linked to the repair of the poisoned DNA topoisomerase II

Wu, J.; Jiang, C.; Ma, C.; Wang, D.; Liu, L.; Zhang, C.; Chen, F.

2020-09-03 genetics 10.1101/2020.09.03.280669 medRxiv

Top 0.1%

5.4%

TOP2-poisoning bioflavonoids and pesticides are linked to copy number variation-related autism and chromosome translocation-related leukemia. On the other hand, poisoned DNA topoisomerase II (TOP2) can lead to chromosome aberration. However, except for a limited number of genes such as the MLL fusion, other poisoned TOP2-targeted genes, as well as their relationships with any specific diseases, are not defined. We used γH2A.X antibodies to immunoprecipitate, genome-wide, the chromatin associated with the repair of DNA double-strand breaks induced by the TOP2 poison etoposide. We identified many transcribable protein-coding and non-protein-coding DNA sequences that are candidates for, or associated with, many gene copy number variation- and/or single nucleotide polymorphism-associated diseases, including but not limited to microdeletion and microduplication syndromes (which present phenotypically as developmental, autistic, neurological, psychiatric, diabetic, autoimmune, and neoplastic diseases among many others) as well as stature, obesity, metabolic syndrome, hypertension, coronary artery disease, ischemic stroke, aortic aneurysm and dissection, leukemia, cancer, osteoporosis, Alzheimer disease, Parkinson disease, and Huntington disease. Our data raise the possibility that poisoned TOP2 might be linked to the specific genetic alterations contributing to these diseases, in addition to the known copy number variation-related autism and chromosome translocation-related leukemia. Based on our data and those of others, we propose a model that may explain features of these diseases such as mosaicism, polygenic traits and pleiotropy.
Author Summary: For the past several decades, the morbidity rate of many diseases, including autism, mental disorders, cancer, cardiovascular diseases, diabetes, and senile dementia, has been rising worldwide. Analysis of the genomes of patients and their family members has identified genes whose alterations, so-called copy number variation (CNV) and single nucleotide polymorphism (SNP), contribute to these diseases. Moreover, the CNVs and SNPs are de novo, that is, they have occurred only in recent generations. Epidemiologically, this indicates that for the past several decades there have existed some unknown worldwide etiologies to which human beings are exposed. If the etiologies are identified, avoiding human exposure may reduce the morbidity of the diseases. We have found that the repair of poisoned topoisomerase II involves many genes that contribute to the aforementioned diseases. As topoisomerase II is known to be located at the genomic sites where the disease-associated CNVs occur, as poisoned topoisomerase II is susceptible to chromosome aberration, and as topoisomerase II poisons, such as dietary bioflavonoids, are widely distributed in the environment, our data raise the yet-to-be-confirmed possibility that environmental topoisomerase II poisons might etiologically contribute to many CNV-associated diseases.

9

Origin and Evolution of DNA methyltransferases (DNMT) along the tree of life: A multi-genome survey.

Bhattacharyya, M.; De, S.; Chakrabarti, S.

2020-04-09 evolutionary biology 10.1101/2020.04.09.033167 medRxiv

Top 0.1%

5.2%

Background: Cytosine methylation is a common DNA modification found in most eukaryotic organisms including plants, animals, and fungi. (Cytosine-5)-DNA methyltransferases (C5-DNA MTases) belong to the DNMT family of enzymes that catalyze the transfer of a methyl group from S-adenosyl methionine (SAM) to cytosine residues of DNA. In mammals, four members of the DNMT family have been reported: DNMT1, DNMT3a, DNMT3b and DNMT3L, but only DNMT1, DNMT3a and DNMT3b possess methyltransferase activity. There have been many reports about the methylation landscape in different organisms, yet there is no systematic report of how the DNA (C5) methyltransferase enzymes have evolved in different organisms. Result: DNA methyltransferases are found to be present in all three domains of life. However, significant variability has been observed in length, copy number and sequence identity when compared across kingdoms. Sequence conservation is greatly increased in invertebrates and vertebrates compared to other groups. Similarly, sequence length has been found to be increased while domain lengths remain more or less conserved. Vertebrates are also found to be associated with more conserved DNMT domains. Finally, comparison between single nucleotide polymorphisms (SNPs) prevailing in human populations and evolutionary changes in the DNMT vertebrate alignment revealed that most of the SNPs were conserved in vertebrates. Conclusion: The sequences (including the catalytic domain and motifs) and structure of the DNMT enzymes have evolved greatly from bacteria to vertebrates with a steady increase in complexity and specificity. This study provides a systematic report of the evolution of the DNA methyltransferase enzyme across different lineages of the tree of life.

10

Why are GWASs limp about the X chromosome?

Gorlov, I. P.; Amos, C.

2022-10-16 evolutionary biology 10.1101/2022.10.11.511851 medRxiv

Top 0.1%

5.1%

The X-chromosome is among the largest human chromosomes. It differs from autosomes by a number of important features including hemizygosity in males, an almost complete inactivation of one copy in females, and unique patterns of recombination. We used data from the Catalog of Published Genome Wide Association Studies to compare densities of the GWAS-detected SNPs on the X-chromosome and autosomes. The density of GWAS-detected SNPs on the X-chromosome is 6-fold lower compared to the density of the GWAS-detected SNPs on autosomes. Differences between the X-chromosome and autosomes cannot be explained by differences in the overall SNP density, lower X-chromosome coverage by genotyping platforms or low call rate of X-chromosomal SNPs. Similar differences in the density of GWAS-detected SNPs were found in female-only GWASs (e.g. ovarian cancer GWASs). We hypothesized that the lower density of GWAS-detected SNPs on the X-chromosome compared to autosomes is not a result of a methodological bias, e.g. differences in coverage or call rates, but has a real underlying biological reason - a lower density of functional SNPs on the X-chromosome versus autosomes. This hypothesis is supported by the observation that (i) the overall SNP density of the X-chromosome is lower compared to the SNP density on autosomes and that (ii) the density of genic SNPs on the X-chromosome is lower compared to autosomes while densities of intergenic SNPs are similar.
Author summary: One of the most striking observations from Genome Wide Association Studies (GWAS) is that the density of GWAS hits is much lower on the X-chromosome compared to autosomes. This was initially explained by technical/analytical reasons such as lower coverage and lack of adequate methods to analyze X-chromosomal SNPs. Since then, better coverage and better analytical methods for X-chromosomal SNPs have been developed. We recently revisited the issue and found that the density of GWAS hits on the X-chromosome is at least 5-fold lower compared to autosomes. We demonstrated that the difference cannot be explained by technical or analytical reasons. We proposed a hypothesis of a real biological phenomenon underlying X versus autosomal differences in the density of GWAS-detected SNPs, namely that the X-chromosome has a lower density of functional polymorphisms compared to autosomes because of stronger selection against X-chromosomal mutations, since X-chromosomal variants are more exposed to natural selection due to hemizygosity in males and X-chromosome inactivation in females. The hypothesis is supported by the analysis of the densities of intergenic, intronic and exonic SNPs on human chromosomes.

11

Looking For Tumor Specific Transcription Factors. Study Of Promoters In Silico.

Kashkin, K. N.

2021-11-12 molecular biology 10.1101/2021.11.11.468214 medRxiv

Top 0.1%

4.8%

This study supplements previously obtained experimental data using modern databases. Previously, tumor-specific activity of several human native and chimeric promoters was demonstrated. Here we compared tumor-specific promoters with promoters of housekeeping genes by the presence of recognition profiles for transcription factors in the DNA sequences of the promoters. A number of transcription factor recognition profiles were identified whose presence in promoters may indicate tumor specificity. Transcription factors which may directly regulate promoters of genes involved in cell proliferation and carcinogenesis were revealed by pathway analysis. The results of the study may help in studying the peculiarities of gene transcription in tumors and in the search for, or the creation of, tumor-specific promoters for cancer gene therapy.

12

Joint Analysis Of Human Retroelements-Linked Histone Modification Profiles Reveals Quickly Evolving Molecular Processes Connected With Cancer

Nikitin, D.

2025-09-27 bioinformatics Community evaluation 10.1101/2025.09.24.677146 medRxiv

Top 0.1%

4.8%

Human retroelements (REs), which comprise approximately 40% of the genome, have played a pivotal role in the evolution of key molecular processes, such as placental development, by introducing novel regulatory elements near host gene promoters and enhancers. Despite their genomic abundance and regulatory influence, the functional trajectories of REs remain poorly understood. Here, leveraging ChIP-seq profiles of histone modifications (H3K4me1, H3K4me3, H3K9ac, H3K27ac, H3K27me3, and H3K9me3) from five human cell lines deposited in the ENCODE database, we systematically ranked the regulatory impact of REs across 25,075 human genes. Gene sets enriched for promoter- and enhancer-associated RE-linked regulatory sites were identified. Consensus gene sets across cell lines were found to be associated with pathways involved in cancer progression, specifically chronic myeloid leukemia and small cell lung cancer, as well as with host defense responses to infection with human T-cell lymphotropic virus type 1. These findings provide new insights into recent human evolution and highlight the ongoing influence of selfish genetic elements on genome regulation and disease susceptibility.

13

Identification of causal genes and mechanisms by which genetic variation mediates juvenile idiopathic arthritis susceptibility using functional genomics and CRISPR-Cas9

Frantzeskos, A.; Malysheva, V.; Shi, C.; Zhao, D.; Gupta, M.; Rossi, S.; Ding, J.; CLUSTER consortium, ; Thomson, W.; Eyre, S.; Bowes, J.; Spivakov, M.; Orozco, G.

2025-05-22 genetic and genomic medicine 10.1101/2025.05.22.25325739 medRxiv

Top 0.1%

4.7%

Objective: Genome-wide association studies (GWAS) have identified numerous single nucleotide polymorphisms (SNPs) associated with juvenile idiopathic arthritis (JIA), the majority of which are located in non-coding regions such as enhancers. This presents a challenge for pinpointing causal variants and their target genes. Interpreting these loci requires functional genomics data from disease-relevant tissues, which have been lacking for JIA. This study seeks to fill that gap and elucidate the biological mechanisms underlying JIA susceptibility. Methods: We performed low-input whole genome promoter Capture Hi-C (PCHi-C) and ATAC-seq on CD4+ T cells from three JIA oligoarthritis patients. To link JIA-associated SNPs to potential causal genes, we integrated PCHi-C data with JIA GWAS summary statistics using our Bayesian prioritisation algorithm, Capture Hi-C Omnibus Gene Score (COGS). ATAC-seq was used to further annotate JIA GWAS loci in CD4+ T cells. We then employed CRISPR activation and interference (CRISPRa/i) in Jurkat cells to validate the prioritised SNPs and their corresponding genes. Results: Chromatin interactions between JIA-associated SNPs and gene promoters were identified in 19 of 44 non-MHC JIA loci, linking 61 known and novel target genes to the disease. Through COGS, we prioritised seven putative causal genes for JIA: RGS14, ERAP2, HIPK1, CCR4, CCRL2, CCR2, and CCR3. SNPs within promoter-interacting regions (PIRs) of these genes were further validated using CRISPRa/i to confirm their roles in regulating gene expression. Conclusions: This study provides insights into the genetic architecture of JIA by integrating genomic and epigenomic data, identifying disease-related genes, functionally validating risk SNPs, and highlighting candidate drugs for repurposing.
Key messages
What is already known on this topic: Recent genome-wide association studies in JIA have identified genetic loci associated with disease risk. However, the precise mechanisms by which these variants contribute to disease pathology remain unclear, as most do not directly alter protein-coding genes. It has been proposed that non-coding SNPs can affect genes that are important in disease through disruption of enhancer-mediated regulatory mechanisms that control their expression, with enhancers exerting their effects through chromatin interactions. Functional characterisation of risk loci is essential to delineate causal SNPs and target genes in JIA.
What this study adds: This study is the first to utilise low-input Promoter Capture Hi-C to map long-range chromatin interactions in CD4+ T cells from JIA patients, alongside ATAC-seq to assess chromatin accessibility within the same samples. It identifies 61 potential target genes at JIA-associated loci and validates the regulatory roles of some of these through CRISPR activation and interference. This work enhances our understanding of how genetic variants modulate gene expression in immune cells, shedding light on key pathways involved in JIA pathogenesis.
How this study might affect research, practice or policy: This study highlights new potential causal genes in JIA, which can help in understanding the pathological mechanisms of the disease, and suggests the potential to repurpose CCR2/CCR5 inhibitors in JIA.

14

Rapid expansion of synaptic complexity as a key contributor to cognitive growth in early humans

Lepski, G.; Arevalo, A.; Silva de Camargo, P.; Nunes, K.; Barbosa Lemes, R.; Ferraz, T.; Strauss, A.; Miyagawa, S.

2025-04-15 evolutionary biology 10.1101/2025.04.09.648028 medRxiv

Top 0.1%

4.7%

Primates have been evolving for over 50 million years. At some point, humans made an unusually large evolutionary leap, giving rise to abilities like the creation of tools, intricate art and complex language. The neuronal synapse, a key player in information processing and brain plasticity, has largely been ignored as a potential factor in this process. Here we used the genomic databases of ancestral hominins to compare the expression levels of 995 genes expressed in the human nervous system among archaic (6 Neanderthal, 2 Denisovan) and modern humans (62 African Modern Human). We searched in the 95th top p-value for variants whose derived alleles had a frequency ≥90% in modern and <10% in archaic humans. We then used the STRING database to perform protein-protein interaction networks on the 95th top p-value for the variants. We identified genetic variants in 15 genes, and in two (STX16 and UBASH3B), the allele frequency was significantly higher in modern versus archaic humans. These genes have previously been associated with critical cellular (proliferation, differentiation, migration, survival) and synaptic (exocytosis, synaptic vesicle fusion) processes, supporting the idea that changes in synaptic structure and function may have played a key role in the development of human cognition.
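
The frequency criterion stated in this abstract (derived allele at 90% or more in modern humans and below 10% in archaic humans) can be written as a simple filter. This is a minimal sketch under stated assumptions, not the authors' pipeline: the `Variant` record, the function `modern_enriched`, the gene labels and every frequency value below are hypothetical, and the p-value ranking step mentioned in the abstract is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str            # hypothetical gene label
    modern_freq: float   # derived-allele frequency in modern humans (0 to 1)
    archaic_freq: float  # derived-allele frequency in archaic humans (0 to 1)


def modern_enriched(variants, modern_min=0.90, archaic_max=0.10):
    """Keep variants with derived-allele frequency >= modern_min in modern
    humans and < archaic_max in archaic humans (thresholds from the abstract)."""
    return [
        v for v in variants
        if v.modern_freq >= modern_min and v.archaic_freq < archaic_max
    ]


# Made-up records for illustration only.
example = [
    Variant("GENE_A", modern_freq=0.97, archaic_freq=0.00),  # passes both tests
    Variant("GENE_B", modern_freq=0.90, archaic_freq=0.10),  # fails: archaic not < 0.10
    Variant("GENE_C", modern_freq=0.85, archaic_freq=0.00),  # fails: modern < 0.90
]

selected = [v.gene for v in modern_enriched(example)]
print(selected)
```

Note the asymmetric bounds: the modern threshold is inclusive and the archaic threshold is strict, mirroring the "≥90%" and "<10%" wording.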

15

Identification of genetic variants regulating the abundance of clinically relevant plasma proteins using the Diversity Outbred mouse model

Philtjens, S.; Acri, D. J.; Kim, B.; Kim, H.; Kim, J.

2020-11-04 genetics 10.1101/2020.11.04.367938 medRxiv

Top 0.1%

4.3%

Although there have been numerous expression quantitative trait loci (eQTL) studies, the effect of genetic variants on the levels of multiple plasma proteins still warrants more systematic investigation. To identify genetic modifiers that influence the levels of clinically relevant plasma proteins, we performed protein quantitative trait locus (pQTL) mapping on 92 proteins using the Diversity Outbred (DO) mouse population and identified 12 significant cis and 6 trans pQTL. Among them, we discovered coding variants in a cis-pQTL in Ahr and a trans-pQTL in Rfx1 for the IL-17A protein. Our study reports an innovative pipeline for the identification of genetic modifiers that may be targeted for drug development.
Author Summary: Blood plasma is a body fluid that can be collected in a noninvasive way to detect diseases, such as autoimmune disease. However, it is known that plasma protein levels are affected by both the environment and genetic background. To determine the effect of genetics on plasma protein levels in humans, one needs a rather large sample size. To overcome this critical issue, a mouse model, the Diversity Outbred (DO), was established that is genetically as diverse as the human population. In this study, we used N=140 DO mice and genotyped over 140,000 variants. In addition, we measured the levels of 92 proteins in the plasma of these DO mice using Olink Proteomics technology. The proteins detected in this panel are known to be detectable in human plasma, making our study translatable to humans. We identified 18 significant protein quantitative trait loci. Furthermore, we describe an analysis pipeline that allows for the detection of a single gene in the locus that is responsible for the differences in protein levels. We identified how variants in the Regulatory Factor X1 (Rfx1) gene regulate Interleukin-17A (IL-17A) plasma levels. Our study reports an innovative approach to identify genetic modifiers that may be targeted for drug development.

16

Expanding the list of sequence-agnostic enzymes for chromatin conformation capture assays with S1 nuclease

Gridina, M.; Popov, A.; Shadskiy, A.; Torgunakov, N.; Kechin, A.; Khrapov, E.; Ryzhkova, O.; Filipenko, M.; Fishman, V.

2023-06-15 molecular biology 10.1101/2023.06.15.545138 medRxiv

Top 0.1%

4.3%

This study presents a novel approach for mapping global chromatin interactions using S1 nuclease, a sequence-agnostic enzyme. We develop and outline a protocol that leverages S1 nuclease's ability to effectively introduce breaks into both open and closed chromatin regions, allowing for comprehensive profiling of chromatin properties. Our S1 Hi-C method enables the preparation of high-quality Hi-C libraries, marking a significant advancement over previously established DNase I Hi-C protocols. Moreover, S1 nuclease's capability to fragment chromatin to mono-nucleosomes suggests the potential for mapping the three-dimensional organization of the genome at high resolution. This methodology holds promise for an improved understanding of chromatin state-dependent activities and may facilitate the development of new genomic methods.

17

Escherichia coli σ38 promoters use two UP elements instead of a -35 element: resolution of a paradox and discovery that σ38 transcribes ribosomal promoters

Franco, K. S.; Sun, Z.; Chen, Y.; Cagliero, C.; Zuo, Y.; Zhou, Y. N.; Kashlev, M.; Jin, D.; Schneider, T. D.

2020-02-06 molecular biology 10.1101/2020.02.05.936344 medRxiv

Top 0.1%

4.1%

In E. coli, one RNA polymerase (RNAP) transcribes all RNA species, and different regulons are transcribed by employing different sigma (σ) factors. RNAP containing σ38 (σS) activates genes responding to stress conditions such as stationary phase. The structure of σ38 promoters has been controversial for more than two decades. To construct a model of σ38 promoters using information theory, we aligned proven transcriptional start sites to maximize the sequence information, in bits, and identified a -10 element similar to that of σ70 promoters. We could not align any -35 sequence logo; instead we found two patterns upstream of the -35 region. These patterns have dyad symmetry sequences and correspond to the location of UP elements in ribosomal RNA (rRNA) promoters. Additionally, the UP element dyad symmetry suggests that the two polymerase subunits, which bind to the UPs, should have a two-fold dyad axis of symmetry on the polymerase, and this is indeed observed in an X-ray crystal structure. Curiously, the CTDs should compete for overlapping UP elements. In vitro experiments confirm that σ38 recognizes the rrnB P1 promoter and requires a -10 element and UP elements but no -35 element. This clarifies the long-standing paradox of how σ38 promoters differ from those of σ70.
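
The abstract above measures aligned promoter sites by their sequence information in bits. As a rough illustration only, the Python sketch below computes per-position information content for a set of already-aligned DNA sites as log2(4) minus the Shannon entropy of each column. The function name and the toy sites are hypothetical, the small-sample correction normally used in sequence-logo analysis is omitted, and nothing here reproduces the authors' alignment procedure.

```python
import math
from collections import Counter


def information_per_position(sites, alphabet="ACGT"):
    """Return the information content (bits) of each column of aligned sites.

    Information = log2(alphabet size) - Shannon entropy of the column.
    Assumption: no small-sample correction is applied.
    """
    if not sites:
        raise ValueError("at least one aligned site is required")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all aligned sites must have the same length")

    max_bits = math.log2(len(alphabet))
    n = len(sites)
    bits = []
    for col in range(length):
        counts = Counter(site[col] for site in sites)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        bits.append(max_bits - entropy)
    return bits


# Hypothetical toy alignment, not data from the preprint.
toy_sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
bits = information_per_position(toy_sites)
total_bits = sum(bits)
```

Fully conserved columns reach the 2-bit maximum for DNA, while variable columns score lower; summing the columns gives the total information of the aligned site under these simplifying assumptions.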

18

Simultaneous discovery of candidate imprinted genes and Imprinting Control Regions in the mouse genome

Bina, M.; Wyss, P.

2019-09-24 genomics 10.1101/780551 medRxiv

Top 0.1%

4.0%

In mammals, parent-of-origin-specific gene expression is regulated by specific genomic DNA segments known as Imprinting Control Regions (ICRs) and germline Differentially Methylated Regions (gDMRs). In the mouse genome, the known ICRs/gDMRs often include clusters of a set of composite-DNA-elements known as ZFBS-morph overlaps. These elements consist of the ZFP57 binding site (ZFBS) overlapping a subset of the MLL1 morphemes. To improve detection of such clusters, we created density-plots. In genome-wide analyses, peaks in these plots pinpointed ~90% of the known ICRs/gDMRs and located candidate ICRs within relatively long genomic DNA sections. In several cases, the candidate ICRs mapped to chromatin boundaries, to a subset of gene-transcripts, or to both. By viewing the plots at the UCSC genome browser, we could examine the candidate ICRs in the context of the genes in their vicinity. This strategy uncovered several potential imprinted genes with a broad range of physiologically important functions. Examples include: folliculogenesis; lineage commitment of murine embryonic stem cells; the development of the junctional zone of the placenta; left-right patterning of the body axis; the development of the neocortex, hippocampus, and cerebellum; postnatal vision; self-renewal of mouse spermatogonial stem cells; and histone-to-protamine replacement during spermatogenesis.
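
The density-plot idea in the abstract above, where clusters of a sequence element show up as peaks, can be sketched as a sliding-window count. The Python example below is a minimal illustration under stated assumptions: the motif string, window size, step, and toy sequence are all hypothetical placeholders, exact string matching stands in for whatever element definition the authors used, and this is not their pipeline.

```python
def motif_positions(sequence, motif):
    """Return the start index of every (possibly overlapping) motif match."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


def window_density(sequence, motif, window, step):
    """Count motif start positions falling in each sliding window.

    Returns a list of (window_start, count) pairs.
    """
    positions = motif_positions(sequence, motif)
    last_start = max(len(sequence) - window, 0)
    densities = []
    for w_start in range(0, last_start + 1, step):
        w_end = w_start + window
        count = sum(1 for p in positions if w_start <= p < w_end)
        densities.append((w_start, count))
    return densities


# Hypothetical toy input: a cluster of three motif copies, then one isolated copy.
toy_motif = "TGCCGC"
toy_sequence = "AAAA" + toy_motif * 3 + "AAAAAAAAAA" + toy_motif + "AAAA"
densities = window_density(toy_sequence, toy_motif, window=20, step=10)
peak_window = max(densities, key=lambda pair: pair[1])
```

In this toy case the window holding the three clustered copies has the highest count, which is the kind of peak a density plot would highlight; real analyses would work on genome coordinates and far larger windows.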

19

Contribution of DNA breathing to physical interactions with transcription factors

Butt, W. A.; Lai, B.; Chiu, T.-P.; Bhattarai, M.; Qian, S.; Bishop, A. R.; Duan, J.; Alexandrov, B. S.; Rohs, R.; He, X.

2025-01-22 genetics 10.1101/2025.01.20.633840 medRxiv

Top 0.1%

4.0%

Interaction between transcription factors (TFs) and DNA plays a key role in regulating gene expression. It is generally believed that these interactions are controlled through recognition of DNA core motifs by TFs. Nevertheless, several studies have pointed out the limitation of this view; in particular, DNA sequence variants influencing TF binding are often located outside of core motifs. One possible explanation is that the physical properties of DNA may play a role in TF-DNA interactions. Recent studies have supported the importance of DNA shape features, especially in flanking regions of core motifs. Another important physical property of DNA is DNA breathing, the spontaneous opening of double-stranded DNA through thermal motions. However, there have been few genomic studies of the role of DNA breathing in TF-DNA interactions. In this work, we analyzed in vitro TF-DNA binding data of three TFs and found that DNA breathing features inside or near core motifs are correlated with binding affinity. This suggests that these TFs may prefer locally and temporally melted DNA formed through breathing. We extended the analysis to 44 TFs with in vivo ChIP-seq binding data. We found that for a large proportion of TFs, breathing features in or near core motifs are associated with binding, but the sign and magnitude of these associations vary substantially across TF families. Altogether, our study supports the hypothesis that DNA breathing features near binding motifs contribute to TF-DNA interactions. Author Summary: Proper regulation of when and where genes are expressed is crucial to biological development and function. This process is largely controlled by the interaction of transcription factors (TFs) with DNA sequences. The recognition of specific DNA sequences by TFs is important to ensure that only the correct genes are activated. Extensive work has shown that TFs prefer to bind certain DNA sequence patterns of 6-20 bp, known as motifs. However, the structure of DNA molecules may also play a role. In this work, we explored the role of DNA breathing, which refers to the spontaneous opening of double-stranded DNA due to thermal motions. This process creates transient, single-stranded "bubbles" in DNA. By examining TF-DNA binding data of >60 TFs, we found that the propensity of DNA to form bubbles near motifs is often associated with the binding affinity of the DNA sequence. Interestingly, the patterns of these associations seem to vary with TFs. Altogether, our results highlight the potential of DNA breathing to influence TF-DNA interactions.

20

Primate deep conserved noncoding sequences and non-coding RNA: their possible relatedness to brain and Central Nervous System

Hettiarachchi, N.

2021-08-17 evolutionary biology 10.1101/2021.08.17.456625 medRxiv

Top 0.1%

4.0%

Background: Conserved non-coding sequences (CNSs) are extensively studied for their regulatory properties and functional importance to organisms. Many features of these sequences, such as location, proximity to the likely target gene, lineage specificity, functionality of likely target genes, and nucleotide composition, have been investigated and have provided meaningful insight into the underlying evolutionary importance of these elements. How to assign function to non-coding regions of eukaryote genomes is another area under thorough investigation. On one hand, evolutionary analyses, including signatures of selection or conservation, can indicate the presence of constraint, suggesting that sequences evolving non-neutrally are candidates for functionality. On the other hand, there is evidence based on experimental profiling of transcription, methylation, histone modifications, and chromatin state. While these types of data are very important and are associated with function in most cases, this is not always the case. Evolutionary conservation, though a highly conservative criterion that mostly considers elements identifiable in more than one species, is still used as the initial guideline in investigating function via experiments. An understanding of the experimental profiles of conserved non-coding regions, which may show patterns often associated with these potentially functional elements, could make it easier to construe the functionality of conserved non-coding regions. Results: In an effort to integrate experimental profile data, we investigated evidence of expression of conserved non-coding sequences (CNSs). For CNSs from ten primates, we assessed transcription, histone modifications, level of evolutionary constraint or accelerated evolution, possible target genes, tissue expression profiles of likely target genes (as some CNSs may be enhancers, and may be ncRNAs that interact directly with mRNA), and clustering patterns of CNSs. In total, we found 153,475 CNSs conserved across all ten primates. Of these, 59,870 overlapped non-coding regions of ncRNA genes. H3K4Me1 marks (often associated with active enhancers) were highly correlated with CNSs, whereas H4K20Me1 (linked to, e.g., DNA damage repair) had high correlation with conserved ncRNA regions (ncRNA-gene-CEs). Both CNSs and conserved ncRNA showed evidence of being under purifying selection. The CNSs in our dataset overall exhibited lower allele frequencies, consistent with higher levels of evolutionary constraint. We also found that CNSs and ncRNA-gene-CEs form mutually exclusive groups. The analyses also suggest that both types of conserved elements have undergone waves of accelerated evolution, which we speculate may indicate changes in regulatory requirements following divergence events. Finally, we find that likely target genes for Hominoidae-, primate-, and mammalian-specific CNSs and ncRNA-gene-CEs are predominantly associated with brain-related function in humans. Conclusion: The deep conserved primate CNSs and ncRNA-gene-CEs signify functional importance, suggesting ongoing recruitment of these elements into brain-related functions, consistent with King and Wilson's hypothesis that regulatory changes may account for rapid changes in phenotype among primates.